Locality-Sensitive Hashing Without False Negatives for l_p
Abstract
In this paper, we show a construction of locality-sensitive hash functions without false negatives, i.e., functions that ensure a collision for every pair of points within a given radius R in d-dimensional space equipped with the l_p norm, where p ∈ [1, ∞]. Furthermore, we show how to use these hash functions to solve the c-approximate nearest neighbor search problem without false negatives. Namely, if there is a point at distance R, we will certainly report it, and points at distance greater than cR will not be reported, for c = Ω(√d, d^{1−1/p}). The constructed algorithms work:
• with preprocessing time O(n log(n)) and sublinear expected query time,
• with preprocessing time O(poly(n)) and expected query time O(log(n)).
Our paper reports progress on answering the open problem presented by Pagh [8], who considered the nearest neighbor search without false negatives for the Hamming distance.
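To make the "no false negatives" guarantee concrete, here is a minimal sketch in Python. It is not the construction from the paper: it is a deliberately simple illustration that assumes a single coordinate is used as the hash projection, and the names `CoordinateBucketIndex`, `lp_distance`, `candidates`, and `query` are hypothetical. It relies only on the fact that, for any p ∈ [1, ∞], a single coordinate difference never exceeds the l_p distance, so two points within distance R land in the same bucket of width R or in adjacent ones.

```python
import math
from collections import defaultdict


def lp_distance(x, y, p):
    """l_p distance for p in [1, inf]; p = math.inf gives the max norm."""
    diffs = [abs(a - b) for a, b in zip(x, y)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1.0 / p)


class CoordinateBucketIndex:
    """Illustrative index (an assumption, not the paper's scheme).

    Points are bucketed by floor(x[coord] / radius). If ||x - y||_p <= radius,
    then |x[coord] - y[coord]| <= radius, so the two bucket keys differ by at
    most 1. Probing the query's bucket and its two neighbours therefore never
    misses a point within the radius (ignoring floating-point edge cases).
    """

    def __init__(self, points, radius, coord=0):
        self.radius = radius
        self.coord = coord
        self.buckets = defaultdict(list)
        for pt in points:
            self.buckets[self._key(pt)].append(pt)

    def _key(self, pt):
        return math.floor(pt[self.coord] / self.radius)

    def candidates(self, q):
        """All stored points in the query's bucket or an adjacent bucket."""
        k = self._key(q)
        found = []
        for key in (k - 1, k, k + 1):
            found.extend(self.buckets.get(key, []))
        return found

    def query(self, q, p):
        """Filter candidates by exact l_p distance to drop false positives."""
        return [pt for pt in self.candidates(q)
                if lp_distance(pt, q, p) <= self.radius]


points = [(0.0, 0.0), (0.9, 0.0), (1.1, 0.0), (5.0, 5.0)]
index = CoordinateBucketIndex(points, radius=1.0)
near = index.query((0.95, 0.0), p=2)
```

A single coordinate filters very weakly, so this sketch gives no sublinear query time; it only shows why every pair within radius R is guaranteed to meet in a probed bucket.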
Similar resources
MapReduce Based Personalized Locality Sensitive Hashing for Similarity Joins on Large Scale Data
Locality Sensitive Hashing (LSH) has been proposed as an efficient technique for similarity joins for high dimensional data. The efficiency and approximation rate of LSH depend on the number of generated false positive instances and false negative instances. In many domains, reducing the number of false positives is crucial. Furthermore, in some application scenarios, balancing false positives ...
Locality-sensitive Hashing without False Negatives
We consider a new construction of locality-sensitive hash functions for Hamming space that is covering in the sense that it is guaranteed to produce a collision for every pair of vectors within a given radius r. The construction is efficient in the sense that the expected number of hash collisions between vectors at distance cr, for a given c > 1, comes close to that of the best possible data i...
On fast bounded locality sensitive hashing
In this paper, we examine the hash functions expressed as scalar products, i.e., f(x) = ⟨v, x⟩, for some bounded random vector v. Such hash functions have numerous applications, but often there is a need to optimize the choice of the distribution of v. In the present work, we focus on so-called anti-concentration bounds, i.e., the upper bounds of P[|⟨v, x⟩| < α]. In many applications, v is...
Fast indexing strategies for robust image hashes
Similarity preserving hashing can aid forensic investigations by providing means to recognize known content and modified versions of known content. However, this raises the need for efficient indexing strategies which support the similarity search. We present and evaluate two indexing strategies for robust image hashes created by the ForBild tool. These strategies are based on generic indexing ...
Hyperplane Arrangements and Locality-Sensitive Hashing with Lift
Locality-sensitive hashing converts high-dimensional feature vectors, such as image and speech, into bit arrays and allows high-speed similarity calculation with the Hamming distance. There is a hashing scheme that maps feature vectors to bit arrays depending on the signs of the inner products between feature vectors and the normal vectors of hyperplanes placed in the feature space. This hashin...
Publication date: 2016